knitr::opts_chunk$set(
warning = FALSE, message = FALSE,echo = TRUE, results = "hide", tidy = TRUE)
Package references used in A1 (Morgan 2021; Zhu et al. 2008; Chen, Lun, and Smyth 2016; Xie 2021; Wickham et al. 2019; Durinck et al. 2009; Davis and Meltzer 2007; Xie, Cheng, and Tan 2022; Isserlin 2022c, 2022a)
Platform title:Illumina HiSeq 2000 (Homo sapiens)
Submission data:Nov 02 2010
Last update data:Mar 27 2019
Organism:Mar 27 2019
Number of GEO datasets that use this technology:9316
Number of GEO samples that use this technology: 160195
# get supplementary file
seq_files = getGEOSuppFiles("GSE152201")
file_name = rownames(seq_files)
file_name
exp_data = read.delim(file_name[1], header = TRUE, check.names = FALSE)
head(exp_data)
My dataset contains data of two cell lines, I am using the data from the MDA cell line
# My dataset contains data of two cell lines, I am using the data from the MDA
# cell line
DT::datatable(data.frame(exp_data[1:5, c(1, 2, 9:14)]), options = list(scrollX = T),
caption = "Table 1 :My data sample")
# select the mda cell line data
exp_data_mda <- exp_data[, c(1, 2, 9:14)]
# converts it to count per million
count_per_million_mda = cpm(exp_data[, 3:8])
# change the row name
rownames(count_per_million_mda) <- exp_data_mda[, "gene_id"]
# keep cpm greater than 3 per row
keep = rowSums(count_per_million_mda > 1) >= 3
exp_data_filtered = exp_data_mda[keep, ]
summarised_gene_count <- sort(table(exp_data_filtered$gene_id), decreasing = TRUE)
# sort it in decreasing order
summarised_gene_count_unfiltered <- sort(table(exp_data_mda$gene_id), decreasing = TRUE)
# show an example
knitr::kable(summarised_gene_count[which(summarised_gene_count > 1)][1:6], format = "html")
dim(exp_data_mda)
# how many genes are keptn
low_count_filtered <- nrow(exp_data_filtered)
# filter outlier using normalization
filtered_data_matrix <- as.matrix(exp_data_filtered[, 3:8])
# change the row names
rownames(filtered_data_matrix) <- exp_data_filtered$gene_id
# create a DGEList for normalization
d = DGEList(counts = filtered_data_matrix, group = samples$treatment)
# calculate normalization factor
d_TMM = calcNormFactors(d)
normalized_counts <- cpm(d_TMM)
# number of outlier filtered
nrow(exp_data_filtered) - nrow(normalized_counts)
# map my ensembl id to hgnc symbol using the correct mart and biomart filter,
conversion_mda <- "mda_cell_line.rds"
if (file.exists(conversion_mda)) {
mda_cell_line <- readRDS(conversion_mda)
} else {
mda_cell_line <- getBM(attributes = c("ensembl_gene_id", "hgnc_symbol"), filters = "ensembl_gene_id",
value = exp_data_filtered$gene_id, mart = ensembl)
saveRDS(mda_cell_line, conversion_mda)
}
## [1] 15746
## [1] 16953
# merge to include rest of the data
normalized_counts_merge <- merge(mda_cell_line, normalized_counts, by.x = 1, by.y = 0,
all.y = TRUE)
# how many are missing
kable(normalized_counts_merge[1:5, 1:5], type = "html", caption = "Table 3 : missing hgnc symbols")
ensembl_id_missing_gene <- normalized_counts_merge$ensembl_gene_id[which(is.na(normalized_counts_merge$hgnc_symbol))] #1207
no_symbol = length(ensembl_id_missing_gene)
# old hgnc from my original dataset
old_mapping <- merge(exp_data_filtered[, 1:2], data.frame(ensembl_id_missing_gene),
by.x = 2, by.y = 1)
# some of the old hgnc symbol
kable(old_mapping[1:10, ], type = "html", caption = " Table 4: unmapped ensembl id found in original dataset")
# how many ribosomal protein there are
count = nrow(old_mapping[grep(old_mapping$gene_name, pattern = "^RP"), ])
I merge the data set and 1207 observations are missing, 258are ribosomal genes but it is not really important for this dataset. The coverage now around 95%
# missing hgnc symbol
missing_ids_subset <- normalized_counts_merge[which(is.na(normalized_counts_merge$hgnc_symbol)),
]
# merge the missing symbol with old mapping
missing_ids_subset_withids <- merge(old_mapping, missing_ids_subset, by.x = 1, by.y = 1)
# filter out the NAs caused by the merge
missing_ids_subset_withids <- missing_ids_subset_withids[-3]
# set the same column names
colnames(missing_ids_subset_withids)[1:2] <- colnames(normalized_counts_merge)[1:2]
# bind the old mapping with the rows that have missing hgnc symbol
finalized_normalized_counts <- rbind(normalized_counts_merge[which(!is.na(normalized_counts_merge$hgnc_symbol)),
], missing_ids_subset_withids)
# drop some NAs
finalized_normalized_counts <- normalized_counts_merge %>%
drop_na(.)
# check and filter duplicated ensembl id
length(unique(finalized_normalized_counts$ensembl_gene_id))
length(unique(finalized_normalized_counts$hgnc_symbol))
finalized_normalized_counts <- distinct(finalized_normalized_counts, ensembl_gene_id,
.keep_all = TRUE)
any(is.na(finalized_normalized_counts$hgnc_symbol))
# write.table(finalized_normalized_counts, file = file.path(getwd(),'Data',
# 'GSE152201_finalized_normalized_counts_mda_2022.txt'), sep = '\t')
In the last assignment, I was working with gene expression data pulling from Geo with id GSE152201. This dataset describes the experiment done for two breast cancer cell lines; HCC70 and MDA. HCC70 is a basal cell line while MDA is a meschemyal cell line. Each cell line was treated with a control(ETOH), and a treatment(DEX) with three replicates each. DEX is a inducer of GR, the experiment is trying to find out what genes are upreguated by this inducer. Complemented by other experiments, they ultimately wanted to show that GR coordinates with STAT3 to active gene expression signature for basal-like triple-negative breast cancer.(Conway et al. 2020)
For this dataset,I filtered the gene that has count per million less than 3 for each cell line, normalizing by treatment using EdgeR. I also mapped the ensembl gene id to HGNC symbol using Biomart and merged the HGNC symbol that was originally existed in my dataset to the unmapped HGNC symbol.The final coverage above 95%
Since this data contains two cell lines, I am just going to perform one differential expression for the MDA cell line.
knitr::opts_chunk$set(warning = FALSE, message = FALSE, echo = TRUE, results = "hide",
tidy = TRUE)
All packages references in A2 (Morgan 2021; Gu et al. 2014; Gu, Eils, and Schlesner 2016; Ritchie et al. 2015; Xie 2021; Xie, Cheng, and Tan 2022; Kolberg et al. 2020; Huber et al. 2015; Wickham et al. 2019; Isserlin 2022d, 2022b)
# rename for convinence
normalized_count_data <- finalized_normalized_counts
# select the normalized data
heatmap_matrix <- normalized_count_data[, 3:8]
# rename the column and row names
rownames(heatmap_matrix) <- normalized_count_data$ensembl_gene_id
colnames(heatmap_matrix)
colnames(heatmap_matrix) <- colnames(normalized_count_data[, 3:8])
# row normalized the data
heatmap_matrix <- t(scale(t(heatmap_matrix)))
# some NAs, could be due to 0 variance
heatmap_matrix <- data.frame(heatmap_matrix)
# store 104 NAs
MDA_heatmap_missing <- heatmap_matrix %>%
filter_all(any_vars(is.na(.)))
# filter 104 NAs
heatmap_matrix <- heatmap_matrix %>%
drop_na(.)
# set up the color gradient
if (min(heatmap_matrix) == 0) {
heatmap_col = colorRamp2(c(0, max(heatmap_matrix)), c("white", "red"))
} else {
heatmap_col = colorRamp2(c(min(heatmap_matrix), 0, max(heatmap_matrix)), c("blue",
"white", "red"))
}
# plot a heatmap
current_heatmap <- Heatmap(as.matrix(heatmap_matrix), show_row_dend = TRUE, show_column_dend = TRUE,
col = heatmap_col, show_column_names = TRUE, show_row_names = FALSE, show_heatmap_legend = TRUE)
current_heatmap
This heat map show there is some separation of signals but is not really clear. (Gu, Eils, and Schlesner 2016; Gu et al. 2014; Isserlin 2022d)
Since DEX is an inducer of the GR(NR3C1) gene. I would assume it would be significantly up-regulated. Here I ran a t-test to test this and it is significant
# get the column indexes for control and treatment group
etoh_samples <- grep(colnames(normalized_count_data), pattern = "EtoH")
dex_samples <- grep(colnames(normalized_count_data), pattern = "DEX")
# Find gene of interest, GR(NR3C1)
gr_gene_of_interest <- which(normalized_count_data$hgnc_symbol == "NR3C1")
# separate the data into control and treatment sample based on the given column
# indexes
gr_etoh_samples <- t(normalized_count_data[gr_gene_of_interest, etoh_samples])
colnames(gr_etoh_samples) <- c("etoh_samples")
# gr_etoh_samples
gr_dex_samples <- t(normalized_count_data[gr_gene_of_interest, dex_samples])
colnames(gr_dex_samples) <- c("dex_samples")
# gr_dex_samples
# Run a t test t.test(x=t(gr_etoh_samples),y=t(gr_dex_samples))
sjPlot::tab_model(t.test(x = t(gr_etoh_samples), y = t(gr_dex_samples)), title = "Table 1 : T test result for GR(NR3C1) gene")
| Dependent variable | ||
|---|---|---|
| estimate | CI | p |
| 39.17 | 15.24 – 63.10 | 0.018 |
# extract useful information from the column names
samples <- data.frame(lapply(colnames(normalized_count_data)[3:8], FUN = function(x) {
unlist(strsplit(x, split = "_"))[c(1, 2, 4)]
})) #?
colnames(samples) <- colnames(normalized_count_data)[3:8]
rownames(samples) <- c("cell_line ", "treatments", "replicates")
samples <- data.frame(t(samples))
knitr::kable(samples[1:6, ], output = "html", caption = "Table 2: Showing for ecah sample, which cell line, which replicats, and what condition")
| cell_line. | treatments | replicates | |
|---|---|---|---|
| MDA231_DEX_4hr_rep1_SL35876 | MDA231 | DEX | rep1 |
| MDA231_DEX_4hr_rep2_SL35877 | MDA231 | DEX | rep2 |
| MDA231_DEX_4hr_rep3_SL35878 | MDA231 | DEX | rep3 |
| MDA231_EtoH_4hr_rep1_SL35879 | MDA231 | EtoH | rep1 |
| MDA231_EtoH_4hr_rep2_SL35880 | MDA231 | EtoH | rep2 |
| MDA231_EtoH_4hr_rep3_SL35881 | MDA231 | EtoH | rep3 |
(Xie 2021)
The position of control(ETOH) and treatment(DEX) samples on the PCA plot is mostly apart.So I would use a simple model to fit
# 2D representation of my data, serve as reference for model building later
plotMDS(heatmap_matrix, col = c(rep("blue", 3), rep("red", 3)))
legend("topright", c("ETOH", "DEX"), pch = 19, fill = c("blue", "red"))
title(main = " Figure 2 : Principle Component Analysis Plot", sub = "MAD cell line")
From the plot we can see there is clear seperation of the two groups, so I will fit a simple model Isserlin (2022d)
# filter duplicated row name (RNA,rRNA)
duplicate_num <- length(normalized_count_data$hgnc_symbol) - length(unique(normalized_count_data$hgnc_symbol))
# filter duplicated symbil
normalized_count_data <- distinct(normalized_count_data, hgnc_symbol, .keep_all = TRUE) # 15512 rows left
# create a design matrix
model_design <- model.matrix(~samples$treatments)
knitr::kable(model_design[1:5, ], type = "html", caption = "model design") # not showm\n in knitted version
# convert dataframe to matrix
expressionMatrix <- as.matrix(normalized_count_data[, 3:8])
# assign row names and column names
rownames(expressionMatrix) <- normalized_count_data$ensembl_gene_id
colnames(expressionMatrix) <- colnames(normalized_count_data)[3:8]
# create a exprssion set
minimalSet <- ExpressionSet(assayData = expressionMatrix)
# fit the model
fit <- lmFit(minimalSet, model_design)
There a 1282 duplicated hgnc id in my file, those are likely due to duplicated hgnc id I kept for my first assignment.
# use empirical bays to find genes that are differentially expressed trend =
# TRUE is specific for RNA-seq data
fit2 <- eBayes(fit, trend = TRUE)
# get the top-hits after correction for multiple hypothesis testing
topfit <- topTable(fit2, coef = ncol(model_design), adjust.method = "BH", number = nrow(expressionMatrix))
# merge the hgnc symbol by ensembl id
output_hits <- merge(normalized_count_data[, 1:2], topfit, by.y = 0, by.x = 1, all.y = TRUE)
# order the p value from high to low
output_hits <- output_hits[order(output_hits$P.Value), ]
knitr::kable(output_hits[1:10, 2:8], type = "html", row.names = FALSE, caption = "Table 4: Simple model fit differential gene expression result using empirica bays")
# number of genes pass un-adjusted threshold
num_p_val <- length(which(output_hits$P.Value < 0.05))
# number of genes pass adjusted threshold
num_adj_p <- length(which(output_hits$adj.P.Val < 0.05))
# get the fit for my gene of interest
gr <- output_hits[which(output_hits$hgnc_symbol == "NR3C1"), ]
DT::datatable(gr, option = list(scrollX = T), caption = "Table 5: model fit result for GR gene")
| hgnc_symbol | logFC | AveExpr | t | P.Value | adj.P.Val | B |
|---|---|---|---|---|---|---|
| NUDT16 | -142.384330 | 105.054512 | -58.43513 | 0 | 1.84e-05 | 0.7574880 |
| SCNN1G | -5.687960 | 2.843980 | -57.16498 | 0 | 1.84e-05 | 0.7560911 |
| TSC22D3 | -169.955077 | 91.069378 | -55.15901 | 0 | 1.84e-05 | 0.7536872 |
| MT1X | -6.336576 | 4.173485 | -55.15554 | 0 | 1.84e-05 | 0.7536829 |
| NT5DC3 | -160.940736 | 147.873895 | -54.37663 | 0 | 1.84e-05 | 0.7526775 |
| MT1E | -157.968254 | 95.207152 | -53.09369 | 0 | 1.84e-05 | 0.7509250 |
| PXYLP1 | -15.922962 | 9.317015 | -50.35502 | 0 | 2.12e-05 | 0.7467310 |
| BIRC3 | -219.243719 | 133.252984 | -47.92206 | 0 | 2.46e-05 | 0.7423938 |
| MT2A | -208.178459 | 156.147866 | -46.56372 | 0 | 2.57e-05 | 0.7396751 |
| FKBP5 | -1137.546636 | 629.001346 | -42.94957 | 0 | 3.65e-05 | 0.7311629 |
(Xie, Cheng, and Tan 2022; Huber et al. 2015; Ritchie et al. 2015; Isserlin 2022d)
There are 3691 genes pass threshold (p value < 0.05) There are 1109 genes pass correction (adjust p value <0.05) Since I conducts thousands of statistical test,I used Benjamini-Hochberg(BH) for multiple hypothesis testing correction because BH is not as strict as bonferroni so I would not get all my genes filtered out and it correct false positives. I use 0.05 as my threshold because 0.05 is a general and common p value for statistically analysis. The purpose of this experiment is to identify all genes that are activated by the addition of DEX. If I do not corrected for mulitple hypothesis testing, there are 4000 more genes up-regulated because of the DEX. This seems a lot to me so I would use the adjusted p value with a threshold of 0.05
# create a new column to show genes after correction that are up-regulated,
# down-regulated, and no change
output_hits$diffexpressed <- "no"
output_hits$diffexpressed[output_hits$logFC > 0 & output_hits$adj.P.Val < 0.05] <- "up"
output_hits$diffexpressed[output_hits$logFC < 0 & output_hits$adj.P.Val < 0.05] <- "down"
output_hits$diffexpressed[output_hits$hgnc_symbol == "NR3C1"] <- "NR3C1"
# use for plotting to allow my gene of interest to be on the top layer
df_layer_2 <- output_hits[output_hits$diffexpressed == "NR3C1", ]
# plot a volcano plot
ggplot() + geom_point(data = output_hits, aes(x = logFC, y = -log10(adj.P.Val), col = diffexpressed)) +
geom_point(data = df_layer_2, aes(x = logFC, y = -log10(adj.P.Val), col = "NR3C1")) +
theme_minimal() + xlim(-50, 50) + labs(title = " Figure 3: Volcano plot for differential expression")
# filter the significant hits
top_hits <- output_hits$ensembl_gene_id[output_hits$P.Value < 0.05]
# create a heatmap matrix for genes that are significant
heatmap_matrix_tophits <- t(scale(t(heatmap_matrix[which(rownames(heatmap_matrix) %in%
top_hits), ])))
# set up the color gradient
if (min(heatmap_matrix_tophits) == 0) {
heatmap_col = colorRamp2(c(0, max(heatmap_matrix_tophits)), c("white", "red"))
} else {
heatmap_col = colorRamp2(c(min(heatmap_matrix_tophits), 0, max(heatmap_matrix_tophits)),
c("blue", "white", "red"))
}
# plot a heatmap for only those genes that are significant
current_heatmap <- Heatmap(as.matrix(heatmap_matrix_tophits), cluster_rows = TRUE,
cluster_columns = TRUE, show_row_dend = TRUE, show_column_dend = TRUE, col = heatmap_col,
show_column_names = TRUE, show_row_names = FALSE, show_heatmap_legend = TRUE,
)
current_heatmap
(Gu, Eils, and Schlesner 2016; Gu et al. 2014; Isserlin 2022d)
After we select for genes that are significant, the heatmap shows a more clear picture of the experiment. This is because the experiment might be well-performed to eliminate any variation that is not resulted from the manipulation. It is also shown by the PCA plot that the control and treatment is well-separated. Gene sets are up-regulated in the control sample are down-regulated in the treatment group. Gene sets are down-regulated in the control sample are up-regulated in the treatment group.
let us not cluster the genes but group controls and experiments
top_hits <- output_hits$ensembl_gene_id[output_hits$adj.P.Val < 0.05]
heatmap_matrix_tophits <- t(scale(t(heatmap_matrix[which(rownames(heatmap_matrix) %in%
top_hits), ])))
if (min(heatmap_matrix_tophits) == 0) {
heatmap_col = colorRamp2(c(0, max(heatmap_matrix_tophits)), c("white", "red"))
} else {
heatmap_col = colorRamp2(c(min(heatmap_matrix_tophits), 0, max(heatmap_matrix_tophits)),
c("blue", "white", "red"))
}
# plot heatmap that orders columns based on what I have given (do not cluster
# columns)
current_heatmap <- Heatmap(as.matrix(heatmap_matrix_tophits), cluster_rows = TRUE,
cluster_columns = TRUE, show_row_dend = TRUE, show_column_dend = FALSE, col = heatmap_col,
show_column_names = TRUE, show_row_names = FALSE, show_heatmap_legend = TRUE,
)
current_heatmap
(Gu, Eils, and Schlesner 2016; Gu et al. 2014; Isserlin 2022d)
I decided to use the gprofiler2 package for enrichment analysis becuase it would be cohesive with my rest of code and gprofiler2 actually come with some nice graph I used Benjamini-Hochberg(BH) for multiple hypothesis testing correction there are thousand of pathway. BH is not as strict as bonferroni so I would not get all pathway out and it corrects false positives.
I choose GO biological processes , Reactome and Wikipathway.I choose those becuase I am interested in biological pathway annotation those three are popular and free. I want my initial analysis to be high quality so I exclude electronic version of GO. I am using the following version as shown in the gprofiler web server-Data sources - Show data versions GO:BP – annotations: BioMart, classes:releases/2021-12-15 REAC – annotations: BioMart, classes: 2022-1-3 WP – 2021-12-10
Some code in this section references a publication about gprofiler2(Kolberg et al. 2020)
# up-regulated subset
up_subset <- output_hits[which(output_hits$adj.P.Val < 0.05 & output_hits$logFC >
0), ]
# down-regulated subset
down_subset <- output_hits[which(output_hits$adj.P.Val < 0.05 & output_hits$logFC <
0), ]
# whole list
whole <- output_hits[which(output_hits$adj.P.Val < 0.05), ]
# enrichment analysis for up regulated subset
gp_up = gost(up_subset$hgnc_symbol, organism = "hsapiens", correction_method = "false_discovery_rate",
exclude_iea = TRUE, sources = c("GO:BP", "REAC", "WP"))
# link to the gprofiler server
gp_up_link = gost(up_subset$hgnc_symbol, organism = "hsapiens", correction_method = "false_discovery_rate",
exclude_iea = TRUE, sources = c("GO:BP", "REAC", "WP"), as_short_link = TRUE)
# filter based on term size and order by p value
gp_up$result -> df_up
df_up <- df_up[which(df_up$term_size < 200), ]
df_up_ordered <- df_up[order(df_up$p_value), ]
up_genesets <- nrow(df_up)
# construct a table about enrichment results
DT::datatable(df_up_ordered[, c(2, 3, 4, 9, 10, 11)], caption = "Enrichment analysis for up-regulated genes",
options = list(scrollX = T))
# plot the enrichment result
p1 <- gostplot(gp_up, interactive = FALSE)
# highlight top ten hits
publish_gostplot(p1, highlight_terms = df_up_ordered$term_id[1:10])
Figure 6: Enrichment plot for up-regulated genes
Figure 6: Enrichment plot for up-regulated genes
# publish_gosttable(gp_up)
# for down regulated subset
gp_down = gost(down_subset$hgnc_symbol, organism = "hsapiens", correction_method = "false_discovery_rate",
exclude_iea = TRUE, sources = c("GO:BP", "REAC", "WP"))
# link
gp_down_link = gost(down_subset$hgnc_symbol, organism = "hsapiens", correction_method = "false_discovery_rate",
exclude_iea = TRUE, sources = c("GO:BP", "REAC", "WP"), as_short_link = TRUE)
# filter based on term size and order by p value
gp_down$result -> df_down
df_down <- df_down[which(df_down$term_size < 200), ]
df_down_ordered <- df_down[order(df_down$p_value), ]
down_genesets <- nrow(df_down)
# construct a table about enrichment results
DT::datatable(df_down_ordered[, c(2, 3, 4, 9, 10, 11)], caption = " Enrichment analysis for down-regulated genes",
options = list(scrollX = T))
# plot the enrichment result
p2 <- gostplot(gp_down, interactive = FALSE)
# highlight top ten hits
publish_gostplot(p2, highlight_terms = df_down_ordered$term_id[1:10])
Figure 7: Enrichment plot for down-regulated genes
Figure 7: Enrichment plot for down-regulated genes
# for the whole set
gp_whole = gost(whole$hgnc_symbol, organism = "hsapiens", correction_method = "false_discovery_rate",
exclude_iea = TRUE, sources = c("GO:BP", "REAC", "WP"))
gp_whole_link = gost(whole$hgnc_symbol, organism = "hsapiens", correction_method = "false_discovery_rate",
exclude_iea = TRUE, sources = c("GO:BP", "REAC", "WP"), as_short_link = TRUE)
# filter based on term size and order by p vlaue
gp_whole$result -> df_whole
df_whole <- df_whole[which(df_whole$term_size < 200), ]
df_whole_ordered <- df_whole[order(df_whole$p_value), ]
whole_genesets <- nrow(df_whole)
# construct a table about enrichment results
DT::datatable(df_whole_ordered[, c(2, 3, 4, 9, 10, 11)], caption = " Enrichment analysis for down-regulated genes",
options = list(scrollX = T))
# plot the enrichment result
p3 <- gostplot(gp_whole, interactive = FALSE)
# highlight top ten hits
publish_gostplot(p3, highlight_terms = df_whole_ordered$term_id[1:10])
Xie, Cheng, and Tan (2022)
For term size between 0 and 200, threshold of 0.05 after BH correction. There are 149 up-regulated genesets. There are 185 down-regulated genesets. There are a total 417 genesets using the whole list.
Comparing enrichment analysis result from using own up or down-regulated gene list versus using the whole list I see there are some difference in the top hit returned. Using the whole gene list gave us some new enrichment results such as MAPK cascade and miRNA transcription. MAPK cascade and miRNA all have regulatory role in breast cancer progression(Jiang et al. 2020 ; Loh et al. 2019). This is likely due the reason that some particular genes are highly expressed while other genes in the same geneset were not. However, in combination they gave a stronger signal.
Yes, the enrichment analysis for up regulated genes supports the result of the paper that GR actives gene expression signature for basal-like triple-negative breast cancer. This is because among the 10 significant genesets in the enrichment analysis, NF-kB stimulates proliferation and stops programmed cell death in many cell types including basal breast cancer cells(Biswas et al. 2004 ). Also,receptor tyrosine kinases is an important signalling pathway that is involved in breast cancer(Hynes 2000)
Because MDA is a stem cell lins. There are some other top hits are related to stem cell characteristics. However, because here I just analyze differential expression after the induction of GR not STAT3, My analysis cannot confidently support the final conclusion of the paper that STAT3 and GR coordinates to active basal-like triple-negative breast cancer.